Let’s meet a dataset:
```r
data
## # A tibble: 142 x 3
##    dataset     x     y
##      <int> <dbl> <dbl>
##  1       4  55.4  97.2
##  2       4  51.5  96.0
##  3       4  46.2  94.5
##  4       4  42.8  91.4
##  5       4  40.8  88.3
##  6       4  38.7  84.9
##  7       4  35.6  79.9
##  8       4  33.1  77.6
##  9       4  29.0  74.5
## 10       4  26.2  71.4
## # ... with 132 more rows
```
Let’s look at its summary statistics:
```r
summary(data)
##     dataset        x               y         
##  Min.   :4   Min.   :22.31   Min.   : 2.949  
##  1st Qu.:4   1st Qu.:44.10   1st Qu.:25.288  
##  Median :4   Median :53.33   Median :46.026  
##  Mean   :4   Mean   :54.26   Mean   :47.832  
##  3rd Qu.:4   3rd Qu.:64.74   3rd Qu.:68.526  
##  Max.   :4   Max.   :98.21   Max.   :99.487  
```
One little scatterplot:
Some more examples of why data visualisation is so valuable:
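The Datasaurus Dozen is a group of 12 two-dimensional datasets that have nearly identical summary statistics: the same means and standard deviations of x and y, and the same Pearson correlation between x and y, to two decimal places.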
But…
In general there are two purposes behind data visualisations:
In both cases, successful visualisation will illuminate and aid understanding, not confuse or obscure.
What is the difference between panel E and the other panels in the following figure?
Compare the two plots above:
- How are the data elements linked to the visual elements?
- What is the difference between the two?
Compare the amount of ink used with the amount of information communicated. A high ink-to-information ratio can indicate a plot that hasn’t been thought through; such plots are often distracting or confusing.
These plots often contain ‘chartjunk’, a term coined by Edward Tufte.
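Tufte expresses the same idea the other way round, as the *data-ink ratio*: the proportion of a graphic's ink that is devoted to the non-redundant display of data. A high ink-to-information ratio therefore corresponds to a low data-ink ratio, and removing chartjunk raises it.

$$
\text{data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
= 1 - \text{proportion of the graphic that can be erased without loss of data-information}
$$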
Compare the figures above:
- Which elements have been removed in the left version?
- Which elements could still be removed? What is their purpose?
When there are too many data points…